Fast Low-rank Metric Learning for Large-scale and High-dimensional Data

Liu, Han, Han, Zhizhong, Liu, Yu-Shen, Gu, Ming

Neural Information Processing Systems

Low-rank metric learning aims to learn better discrimination of data subject to low-rank constraints. It preserves the intrinsic low-rank structure of datasets and reduces the time cost and memory usage of metric learning. However, handling datasets that are both high-dimensional and large in sample size remains a challenge for current methods. To address this issue, we present a novel fast low-rank metric learning (FLRML) method. FLRML casts the low-rank metric learning problem as an unconstrained optimization on the Stiefel manifold, which can be solved efficiently by searching along the descent curves of the manifold. FLRML significantly reduces the complexity and memory usage of the optimization, making the method scalable to both high dimensions and large numbers of samples. Furthermore, we introduce a mini-batch version of FLRML that scales to even larger datasets which are hard to load and decompose in limited memory. Experimental results on several benchmarks with large numbers of high-dimensional samples show that our method achieves high accuracy and is much faster than state-of-the-art methods.
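
A minimal sketch may help make the manifold-search idea concrete. The paper's FLRML optimizer is not reproduced here; the snippet below only illustrates descending along curves on the Stiefel manifold via the Cayley-transform update Y(tau) = (I + (tau/2)W)^{-1} (I - (tau/2)W) X with W = G X^T - X G^T, the style of feasible curve popularized by Wen et al. The toy objective, problem sizes, and step-size scheme are all illustrative assumptions.

```python
import numpy as np

def cayley_step(X, G, tau):
    """One curvilinear step Y(tau) = (I + tau/2 W)^{-1} (I - tau/2 W) X
    with W = G X^T - X G^T; the Cayley transform keeps X^T X = I."""
    n = X.shape[0]
    W = G @ X.T - X @ G.T                         # skew-symmetric (n x n)
    I = np.eye(n)
    return np.linalg.solve(I + 0.5 * tau * W, (I - 0.5 * tau * W) @ X)

# Toy objective: f(X) = -trace(X^T A X) over the Stiefel manifold;
# its minimum is minus the sum of the top-p eigenvalues of A.
rng = np.random.default_rng(0)
n, p = 50, 3
A = rng.standard_normal((n, n)); A = A + A.T      # symmetric test matrix
X, _ = np.linalg.qr(rng.standard_normal((n, p)))  # feasible starting point
f = lambda X: -np.trace(X.T @ A @ X)

for _ in range(300):
    G = -2.0 * A @ X                              # Euclidean gradient of f
    tau, fx = 1e-2, f(X)
    Y = cayley_step(X, G, tau)
    while f(Y) > fx and tau > 1e-12:              # crude backtracking
        tau *= 0.5
        Y = cayley_step(X, G, tau)
    X = Y

print(f(X), -np.linalg.eigvalsh(A)[-p:].sum())    # the two should be close
```

Because the Cayley transform is orthogonal, every iterate satisfies X^T X = I exactly, which is what lets a method of this kind treat the constrained problem as an unconstrained search along curves of the manifold.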


Reviews: Fast Low-rank Metric Learning for Large-scale and High-dimensional Data

Neural Information Processing Systems

However, it still encounters a scalability problem when handling large data. This work gives a new formulation that learns a low-rank cosine-similarity metric by embedding the triplet constraints into a matrix, further reducing the complexity and the size of the matrices involved. The idea of embedding the evaluation of loss functions into matrices is interesting. For Stiefel manifolds, rather than following the projection-and-retraction convention, it adopts the optimization algorithm proposed by Wen et al. Generally, this paper is well-written with promising results.
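
To illustrate what embedding triplet constraints into a matrix can look like in practice, here is a hedged sketch (our own construction, not necessarily the paper's): all m triplets (anchor a_t, positive p_t, negative q_t) are packed into one sparse matrix C so that the entire hinge loss over cosine similarities is evaluated with a couple of matrix products.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(1)
r, n, m = 16, 500, 2000          # embedding rank, samples, triplets
Y = rng.standard_normal((r, n))  # low-rank embedding, one column per sample
a = rng.integers(0, n, m)        # anchor indices (random, for illustration)
p = rng.integers(0, n, m)        # "positive" indices
q = rng.integers(0, n, m)        # "negative" indices

# Column-normalize so plain inner products become cosine similarities.
Yn = Y / np.linalg.norm(Y, axis=0, keepdims=True)

# Constraint matrix C (n x m): column t carries +1 at row p_t and -1 at
# row q_t, so column t of Yn @ C equals Yn[:, p_t] - Yn[:, q_t].
rows = np.concatenate([p, q])
cols = np.tile(np.arange(m), 2)
vals = np.concatenate([np.ones(m), -np.ones(m)])
C = sparse.csc_matrix((vals, (rows, cols)), shape=(n, m))

YC = (C.T @ Yn.T).T                            # r x m, sparse-dense product
margins = np.einsum('it,it->t', Yn[:, a], YC)  # cos(a,p) - cos(a,q) per triplet
loss = np.maximum(0.0, 0.5 - margins).sum()    # hinge loss, margin 0.5 (arbitrary)
print(loss)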


Reviews: Fast Low-rank Metric Learning for Large-scale and High-dimensional Data

Neural Information Processing Systems

The reviewers appreciated the computational improvements and the ideas behind them (such as embedding the evaluation of the cost into matrices). Scores were fairly lukewarm before the rebuttal, but the authors did a good job in the rebuttal of addressing all concerns.


Visualizing the Finer Cluster Structure of Large-Scale and High-Dimensional Data

Liang, Yu, Chaudhuri, Arin, Wang, Haoyu

arXiv.org Machine Learning

Dimension reduction and visualization of high-dimensional data have become important research topics in many scientific fields because of the rapid growth of data sets with large sample sizes and/or dimensionality. In the literature on dimension reduction and information visualization, linear methods such as principal component analysis (PCA) [7] and classical scaling [17] mainly focus on preserving the most significant structure or maximum variance in the data; nonlinear methods such as multidimensional scaling [2], Isomap [16], and curvilinear component analysis (CCA) [5] mainly focus on preserving the long or short distances in the high-dimensional space. They generally perform well in preserving the global structure of the data but can fail to preserve the local structure. In recent years, manifold learning methods such as SNE [6], Laplacian eigenmaps [1], LINE [15], LargeVis [14], t-SNE [18, 19], and UMAP [10] have gained popularity because of their ability to preserve both the local structure and some aspects of the global structure of the data. These methods generally assume that the data lie on a low-dimensional manifold of the high-dimensional input space, and they seek an embedding that preserves the intrinsic structure of the high-dimensional data. Many manifold learning methods suffer from the so-called "crowding problem" when preserving local distances of high-dimensional data in a low-dimensional space: if the small distances in the high-dimensional space are modeled faithfully, points at moderate or large distances from one another must be placed much too far apart in the low-dimensional space.
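
The contrast the abstract draws can be seen directly by embedding the same dataset with a linear method and a manifold method. The following minimal scikit-learn sketch is ours, with illustrative dataset and parameter choices:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)            # 1797 samples, 64 dimensions

# Linear projection: preserves global variance; clusters may overlap.
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: heavy-tailed low-dimensional similarities counteract the
# crowding problem, so local neighborhoods separate into clusters.
X_tsne = TSNE(n_components=2, perplexity=30, init="pca",
              random_state=0).fit_transform(X)

print(X_pca.shape, X_tsne.shape)
```

t-SNE's heavy-tailed low-dimensional similarity function is precisely the mechanism it uses to counteract the crowding problem described above.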

